Africa Environmental Data Pipeline
Overview
The data processing targets defined in predictor_data_processing_targets.R download and process environmental data for epidemiological modeling and forecasting across the African continent.
Data Sources
Static Data (Time-Invariant)
- Soil: Harmonized World Soil Database
- Terrain: Aspect (orientation) and slope categories
- Livestock: Gridded densities of cattle, sheep, and goats
- Elevation: Digital elevation model
- Climate: Long-term bioclimatic variables
- Land Cover: Land use types (trees, crops, water, etc.)
Dynamic Data (Time-Varying)
- Vegetation Indices
  - Sentinel NDVI (2018-present, ~10-day intervals)
  - MODIS NDVI (2005-present, ~16-day intervals)
- Weather Data
  - NASA POWER: Temperature, humidity, precipitation
  - ECMWF: Monthly forecasts up to 6 months ahead
Processing Steps
- Data Download: Fetches data from original sources or AWS cache
- Preprocessing: Standardizes all data to 0.1° spatial resolution
- Temporal Processing:
  - Interpolates satellite data to daily intervals
  - Calculates historical means for each day-of-year
- Anomaly Calculation: Computes deviations from historical baselines
- Forecast Processing: Creates anomalies for different lead times (0-30, 30-60, 60-90 days)
- Integration: Joins all data sources into unified parquet files
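The temporal processing and anomaly steps above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the pipeline's own R code. Linear interpolation is an assumption (the interpolation method is not stated here), and the function names interpolate_daily, day_of_year_means, and anomalies are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def interpolate_daily(observations):
    """Linearly interpolate sparse (date, value) observations to daily values.

    Assumption: linear interpolation between consecutive observations;
    the real pipeline may use a different method.
    """
    obs = sorted(observations)
    daily = {}
    for (d0, v0), (d1, v1) in zip(obs, obs[1:]):
        span = (d1 - d0).days
        for step in range(span):
            frac = step / span
            daily[d0 + timedelta(days=step)] = v0 + frac * (v1 - v0)
    if obs:
        # The loop stops one day short of the final observation, so add it.
        last_day, last_value = obs[-1]
        daily[last_day] = last_value
    return daily


def day_of_year_means(daily):
    """Average all daily values that share the same day-of-year (1..366)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily.items():
        doy = day.timetuple().tm_yday
        totals[doy] += value
        counts[doy] += 1
    return {doy: totals[doy] / counts[doy] for doy in totals}


def anomalies(daily, baseline):
    """Deviation of each daily value from its day-of-year historical mean."""
    return {
        day: value - baseline[day.timetuple().tm_yday]
        for day, value in daily.items()
    }
```

Given roughly 10-day NDVI observations for one grid cell, interpolate_daily fills every calendar day between the first and last observation, day_of_year_means builds the historical baseline, and anomalies subtracts that baseline from each day. In this sketch the baseline is computed from the same series it is applied to; a production pipeline would typically fix a reference period.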
Infrastructure Features
- AWS Integration: Automatic upload/download of processed data to/from S3
- Parallel Processing: Processes multiple time periods simultaneously
- Error Resilience: Continues pipeline execution even if individual components fail
- Smart Caching: Avoids re-processing unchanged data
- Environment Controls: Flags for controlling overwrite and fetch behavior
- Reproducibility: Uses targets dependency management for consistent results
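The error-resilience, caching, and overwrite behavior listed above can be pictured with a small sketch. In the actual pipeline these features come from the targets package and the pipeline's own flags; the Python code below does not reproduce that API. The names fingerprint and run_steps, the in-memory cache dictionary, and the overwrite parameter are all hypothetical.

```python
import hashlib
import json


def fingerprint(inputs):
    """Stable hash of a step's JSON-serializable inputs."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_steps(steps, cache, overwrite=False):
    """Run each step, skipping unchanged ones and surviving failures.

    steps: dict mapping step name -> (function, inputs)
    cache: dict mapping step name -> {"key": ..., "result": ...}
    overwrite: hypothetical flag that forces re-processing
    """
    status = {}
    for name, (func, inputs) in steps.items():
        key = fingerprint(inputs)
        cached = cache.get(name)
        if not overwrite and cached is not None and cached["key"] == key:
            status[name] = "cached"  # inputs unchanged, reuse prior result
            continue
        try:
            result = func(inputs)
        except Exception as exc:
            # Record the failure and keep going with the remaining steps.
            status[name] = f"failed: {exc}"
            continue
        cache[name] = {"key": key, "result": result}
        status[name] = "ran"
    return status
```

A step whose inputs hash to the same key as a previous run is skipped, a step that raises is recorded as failed without stopping the others, and passing overwrite=True forces every step to run again.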
Output
The pipeline produces a comprehensive dataset with:
- Aligned spatial grid (0.1° resolution) covering Africa
- Daily temporal resolution
- Multiple environmental variables
- Historical anomalies and future forecasts
- Partitioned parquet files organized by date
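As a rough illustration of the output layout, the sketch below snaps coordinates to a 0.1° grid and groups rows under one path per date. It uses the Python standard library only and writes no files. The Hive-style date=YYYY-MM-DD directory naming, the part-0.parquet file name, the "output" root, and the assumption that grid nodes sit at multiples of 0.1° are all hypothetical; the pipeline's real layout is not specified here.

```python
from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath


def snap_to_grid(lon, lat, resolution=0.1, decimals=1):
    """Snap a coordinate to the nearest grid node.

    Assumption: grid nodes sit at integer multiples of `resolution`.
    """
    snapped_lon = round(round(lon / resolution) * resolution, decimals)
    snapped_lat = round(round(lat / resolution) * resolution, decimals)
    return snapped_lon, snapped_lat


def partition_path(root, day):
    """Hypothetical Hive-style path for one date partition."""
    return str(PurePosixPath(root) / f"date={day.isoformat()}" / "part-0.parquet")


def group_by_partition(rows, root="output"):
    """Group (day, lon, lat, value) rows by their date partition path."""
    partitions = defaultdict(list)
    for day, lon, lat, value in rows:
        grid_lon, grid_lat = snap_to_grid(lon, lat)
        partitions[partition_path(root, day)].append((grid_lon, grid_lat, value))
    return dict(partitions)
```

Partitioning by date means a reader that needs a single day or a short window can open only the matching files instead of scanning the whole dataset.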